Method and apparatus for segmenting data to create mixed raster content planes



(57) An improved technique for compressing a color
or gray scale pixel map representing a document using
an MRC format includes a method of segmenting an
original pixel map into two planes (12, 16), and then
compressing the data on each plane in an efficient manner.
The image is segmented by separating the image into
two portions at the edges. One plane contains image
data for the dark sides of the edges, while image data
for the bright sides of the edges and the smooth portions
of the image are placed on the other plane. This results
in improved image compression ratios and enhanced
image quality.
Description

[0001] This invention relates generally to image
processing and, more particularly, to techniques for
segmenting, classifying and/or compressing the digital
representation of a document.

[0002] Documents scanned at high resolutions require
very large amounts of storage space. Instead of
being stored as is, the data is typically subjected to some
form of data compression in order to reduce its volume,
and thereby avoid the high costs associated with storing
it. "Lossless" compression methods such as Lempel-Ziv
Welch (LZW) do not perform particularly well on
scanned pixel maps. While "lossy" methods such as
JPEG work fairly well on continuous-tone pixel maps,
they do not work particularly well on the parts of the page
that contain text. To optimize image data compression,
techniques that can recognize the type of data being
compressed are needed.

[0003] Known compression techniques are described
in US-A-5778092, US-A-5251271, US-A-5060980,
US-A-5784175, US-A-5303313 and US-A-5432870.
[0004] In one embodiment, the present invention
discloses a method of segmenting a pixel map representation
of a document which includes the steps of: acquiring
a block of the digital image data, wherein the digital
image data is composed of light intensity signals in
discrete locations; designating a classification for the block
and providing an indication about a context of the block;
segmenting the light intensity signals in the block into
an upper subset and a lower subset based upon the
designated classification; generating a selector set which
tracks the light intensity segmentation; and separately
compressing the digital image data contained in the
upper and lower subsets.
[0005] In another embodiment, the present invention
discloses a method of classifying a block of digital image
data into one of a plurality of image data types, wherein
the block of data is composed of light intensity signals
in discrete locations, which includes: dividing the block
into a bright region and a dark region; dividing a low pass
filtered version of the block into a bright region and a
dark region; calculating average light intensity values for
each of the bright region, the dark region, the filtered
bright region and the filtered dark region; and comparing
a difference between the bright region and the dark
region average light intensity values to a filtered difference
between the bright region and the dark region average
filtered light intensity values; if the average light intensity
difference and the average filtered light intensity
difference are approximately equal, finding a range of values
in which the difference value falls, and classifying the
block based upon the value range; and if the average
light intensity difference and the average filtered light
intensity difference are not approximately equal, finding a
range of values in which the filtered difference value falls
and classifying the block based upon the filtered value
range.



[0006] Some examples of methods according to the
present invention will now be described with reference 
to the accompanying drawings, in which:- 

Figure 1 illustrates a composite image and includes 
an example of how such an image may be decom- 
posed into three MRC image planes- an upper 
plane, a lower plane, and a selector plane; 
Figure 2 contains a detailed view of a pixel map and 
the manner in which pixels are grouped to form 
blocks; 

Figure 3 contains a flow chart which illustrates gen- 
erally, the steps performed to practice the invention; 
Figure 4 contains a detailed illustration of the man- 
ner in which blocks may be classified according to 
the present invention; 

Figure 5 contains a detailed illustration of the man- 
ner in which blocks may be segmented based upon 
their classification according to the present inven- 
tion; 

Figure 6 contains the details of one embodiment of 
the manner in which block variation can be meas- 
ured as required by the embodiment of the invention 
shown in Figure 4; 

Figure 7 contains the details of an embodiment of 
the invention describing classification of blocks 
based upon the block variation measurement pro- 
vided in Figure 6; 

Figure 8 contains the details of an embodiment of 
the invention for which context may be updated 
based upon the block classification provided in Fig- 
ure 7; and, 

Figure 9 contains the details of another embodi- 
ment of the invention for updating context based up- 
on block classification as provided in Figure 7. 

[0007] The present invention is directed to a method
and apparatus for separately processing the various
types of data contained in a composite image. While the
invention will be described in a Mixed Raster Content
(MRC) technique, it may be adapted for use with other
methods and apparatus, and is not therefore limited to
an MRC format. The technique described herein is
suitable for use in various devices required for storing or
transmitting documents such as facsimile devices,
image storage devices and the like, and processing of both
color and grayscale black and white images is possible.

[0008] A pixel map is one in which each discrete
location on the page contains a picture element or "pixel"
that emits a light signal with a value that indicates the
color or, in the case of gray scale documents, how light
or dark the image is at that location. As those skilled in
the art will appreciate, most pixel maps have values that
are taken from a set of discrete, non-negative integers.
[0009] For example, in a pixel map for a color
document, individual separations are often represented as
digital values, often in the range 0 to 255, where 0
represents no colorant (i.e. when CMYK separations are
used) or the lowest value in the range when
luminance-chrominance separations are used, and 255 represents
the maximum amount of colorant or the highest value in
the range. In a gray-scale pixel map this typically
translates to pixel values which range from 0, for black, to
255 for the whitest tone possible. The pixel maps of
concern in the currently preferred embodiment of the
present invention are representations of "scanned
images." That is, images which are created by digitizing
light reflected off of physical media using a digital
scanner. The term bitmap is used to mean a binary pixel map
in which pixels can take one of two values, 1 or 0.
[0010] Turning now to the drawings for a more
detailed description of the MRC format, pixel map 10
representing a color or gray-scale document is preferably
decomposed into a three plane page format as indicated
in Figure 1. Pixels on pixel map 10 are preferably
grouped in blocks 18 (best illustrated in Figure 2), to
allow for better image processing efficiency. The
document format is typically comprised of an upper plane 12,
a lower plane 14, and a selector plane 16. Upper plane
12 and lower plane 14 contain pixels that describe the
original image data, wherein pixels in each block 18
have been separated based upon pre-defined criteria.
For example, pixels that have values above a certain
threshold may be placed on one plane, while those with
values that are equal to or below the threshold are
placed on the other plane. Selector plane 16 keeps track
of every pixel in original pixel map 10 and maps all pixels
to an exact spot on either upper plane 12 or lower plane
14.
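
The threshold-based example in paragraph [0010] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `decompose` and `recompose` are hypothetical, "one plane" is taken to be the upper plane, a selector value of 1 is taken to mean "read from the upper plane", and unused plane positions are simply left as `None`.

```python
def decompose(pixel_map, threshold):
    """Split a gray-scale pixel map into upper, lower and selector planes.

    Assumption for this sketch: pixels above the threshold go to the upper
    plane and all other pixels go to the lower plane.
    """
    upper, lower, selector = [], [], []
    for row in pixel_map:
        up_row, low_row, sel_row = [], [], []
        for value in row:
            on_upper = value > threshold
            up_row.append(value if on_upper else None)
            low_row.append(None if on_upper else value)
            sel_row.append(1 if on_upper else 0)  # binary selector (a bitmap)
        upper.append(up_row)
        lower.append(low_row)
        selector.append(sel_row)
    return upper, lower, selector


def recompose(upper, lower, selector):
    """Rebuild the pixel map by reading each pixel from the plane the selector names."""
    return [
        [u if s else l for u, l, s in zip(up_row, low_row, sel_row)]
        for up_row, low_row, sel_row in zip(upper, lower, selector)
    ]
```

Because the selector records where every pixel went, `recompose(*decompose(pixel_map, threshold))` returns the original pixel map when no lossy compression is applied in between.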

[0011] The upper and lower planes are stored at the
same bit depth and number of colors as the original pixel
map 10 but possibly at reduced resolution. Selector
plane 16 is created and stored as a bitmap. It is
important to recognize that while the terms "upper" and
"lower" are used to describe the planes on which data
resides, it is not intended to limit the invention to any
particular arrangement or configuration.
[0012] After processing, all three planes are
compressed using a method suitable for the type of data
residing thereon. For example, upper plane 12 and lower
plane 14 may be compressed and stored using a lossy
compression technique such as JPEG, while selector
plane 16 is compressed and stored using a lossless
compression format such as gzip or CCITT-G4. It would
be apparent to one of skill in the art to compress and
store the planes using other formats that are suitable for
the intended use of the output document. For example,
in the Color Facsimile arena, Group 4 (MMR) would
preferably be used for selector plane 16, since the particular
compression format used must be one of the approved
formats (MMR, MR, MH, JPEG, JBIG, etc.) for facsimile
data transmission.

[0013] In the present invention digital image data is
preferably processed using an MRC technique such as
described above. Pixel map 10 represents a scanned
image composed of light intensity signals dispersed
throughout the separation at discrete locations. Again,
a light signal is emitted from each of these discrete
locations, referred to as "picture elements," "pixels" or
"pels," at an intensity level which indicates the
magnitude of the light being reflected from the original image
at the corresponding location in that separation.
[0014] In typical MRC fashion, pixel map 10 must be
partitioned into two planes 12 and 14. Figure 3 contains
a schematic diagram, which outlines the overall process
used to segment pixel map 10 into an upper plane 12
and a lower plane 14 according to the present invention.
Block 18 is acquired as indicated in step 210, and is
classified as indicated in step 220. In the preferred
embodiment of the invention, block 18 will initially be classified
as either UNIFORM, SMOOTH, WEAK_EDGE or
EDGE, and its context (either TEXT or PICTURE) will
be provided. The block will then be reclassified as either
SMOOTH or EDGE, depending upon the initial
classification and the context. Next, pixels in block 18 are
segmented, that is, placed on either upper plane 12 or lower
plane 14 according to the criteria that are most appropriate
for the manner in which the block has been classified, as
indicated in step 230. This process is repeated for each
block 18 in original pixel map 10 until the entire pixel
map 10 has been processed. Upper plane 12, lower
plane 14 and selector plane 16 are then separately
compressed, using a technique that is most suitable for the
type of data contained on each, as indicated in step 240.
[0015] Turning now to Figure 4, generally speaking,
classification of blocks 18 into one of the four categories
in step 220 as described above is preferably completed
in three steps. First, the variation of pixel values within
the block is determined as indicated in step 310. Block
variation is best determined by using statistical
measures, which will be described in detail below with
reference to Figure 6. Blocks with large variations throughout
are most likely to actually lie along edges of the image,
while those containing little variation probably lie in
uniform or at least smooth areas. Measuring the variations
within the block allows an initial classification to be
assigned to it as indicated in step 320. Next, image data
within each block 18 is reviewed in detail to allow context
information (i.e. whether the region is in the text or
picture region of the image) to be updated and any
necessary block re-classifications to be performed as shown
in step 330. The UNIFORM blocks are reclassified as
SMOOTH, and the WEAK_EDGE blocks are upgraded
to EDGE in a TEXT context or reclassified as SMOOTH
in a PICTURE context. A smoothed version 20 of the
image is also provided by applying a low pass filter to
the pixel map 10. Smoothed image 20 is used in
conjunction with original image data to offer additional
information during classification, and also provides
unscreened data for halftone regions.
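
The re-classification rule of step 330 is small enough to write out directly. The following is a minimal sketch; the function name `reclassify` and the use of plain strings for classes and contexts are illustrative assumptions rather than part of the described embodiment.

```python
def reclassify(initial_class, context):
    """Map the four initial classes onto SMOOTH or EDGE.

    UNIFORM always becomes SMOOTH. WEAK_EDGE becomes EDGE in a TEXT
    context and SMOOTH in a PICTURE context. SMOOTH and EDGE are kept.
    """
    if initial_class == "UNIFORM":
        return "SMOOTH"
    if initial_class == "WEAK_EDGE":
        return "EDGE" if context == "TEXT" else "SMOOTH"
    return initial_class
```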

[0016] Figure 5 contains details of the manner in
which block 18 is segmented into two planes, as
provided in step 230 of Figure 3. The measurement begins by
first determining at step 410 whether the block being
processed has initially been classified as an EDGE in
step 220. If so, the values v_p of each pixel in the block
are first compared to a brightness threshold value t_s,
wherein pixels that have values equal to or above t_s are
viewed as "bright" pixels, while those with values below
t_s are "dark" pixels. Segmenting EDGE blocks simply
includes placing dark pixels on upper plane 12 as
indicated in step 440, and placing bright pixels on lower
plane 14 as indicated in step 450. If it is determined at
step 410 that block 18 is not an EDGE, all pixels in the
block are processed together, rather than on a pixel by
pixel basis. Segmenting of SMOOTH (non-EDGE)
pixels occurs as follows: if block 18 is in the midst of a short
run of blocks that have been classified as SMOOTH,
and further, all blocks in this short run are dark (v_p < t_s),
all data in the block is placed on upper plane 12. If the
entire block 18 is substantially smooth (i.e. in a long run)
or is bright (in a short run of bright pixels), all data in
block 18 is placed on lower plane 14.
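
The segmentation of paragraph [0016] might be sketched as follows. This is a simplified illustration, not the embodiment itself: `segment_block` is a hypothetical name, the block is a flat list of gray values, a selector value of 1 is assumed to mean "upper plane", and the run analysis is reduced to a caller-supplied flag `in_short_dark_run`.

```python
def segment_block(block, classification, t_s, in_short_dark_run=False):
    """Return (upper, lower, selector) for one block of pixel values.

    classification is "EDGE" or "SMOOTH". in_short_dark_run is a
    hypothetical flag standing in for the short-run test in the text.
    """
    if classification == "EDGE":
        # Pixel by pixel: dark pixels (v_p < t_s) go to the upper plane,
        # bright pixels (v_p >= t_s) go to the lower plane.
        selector = [1 if v < t_s else 0 for v in block]
    elif in_short_dark_run:
        # Whole block goes to the upper plane.
        selector = [1] * len(block)
    else:
        # Long smooth run, or bright block: whole block to the lower plane.
        selector = [0] * len(block)
    upper = [v if s else None for v, s in zip(block, selector)]
    lower = [None if s else v for v, s in zip(block, selector)]
    return upper, lower, selector
```

For instance, `segment_block([10, 200, 90, 128], "EDGE", 128)` places 10 and 90 on the upper plane and 200 and 128 on the lower plane.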
[0017] Turning now to Figure 6, the details of one
embodiment of the invention wherein initial block
classification via block variation measurement may be
accomplished as required by step 310 (Figure 4) are now
described. A threshold, t_s, which allows the block to be
divided into two portions is first calculated as indicated in
step 510. In the preferred embodiment of the invention,
this threshold is obtained by performing a histogram
analysis on the data in the block, but many standard
methods can be used to perform this analysis. For
example, the value that maximizes the distance between the
two portions under the criteria being used for separation,
or that provides for maximum separation between the two
portions of the block, can be selected. Those skilled in the
art will recognize that other methods of choosing the best
threshold are available and the invention is not limited to
this embodiment. Block 18 is then thresholded into these
two parts by comparing the light intensity value of each
pixel to the selected threshold as indicated in step
520. As before, if the pixel value v_p is less than the
threshold, the pixel is referred to as dark. If v_p is greater
than or equal to t_s, the pixel is bright.
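
One possible way to choose the threshold of step 510 is sketched below. The text only calls for a histogram analysis that gives good separation between the two portions; the specific criterion used here, a between-class variance score in the style of Otsu's method, is an assumption swapped in for illustration, and the names `choose_threshold` and `split_block` are hypothetical.

```python
def choose_threshold(block):
    """Pick t_s so that dark (v < t_s) and bright (v >= t_s) pixels separate well.

    Assumed criterion: maximize w_dark * w_bright * (mean_bright - mean_dark)^2.
    A block with a single gray level simply returns that level.
    """
    values = sorted(set(block))
    best_t, best_score = values[0], -1.0
    for t in values[1:]:  # every candidate leaves both portions non-empty
        dark = [v for v in block if v < t]
        bright = [v for v in block if v >= t]
        w_dark = len(dark) / len(block)
        w_bright = len(bright) / len(block)
        gap = sum(bright) / len(bright) - sum(dark) / len(dark)
        score = w_dark * w_bright * gap * gap
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def split_block(block, t_s):
    """Step 520: dark pixels have v_p < t_s, bright pixels have v_p >= t_s."""
    dark = [v for v in block if v < t_s]
    bright = [v for v in block if v >= t_s]
    return dark, bright
```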
[0018] As stated earlier, a smooth version 20 of the
image is obtained by applying a low pass filter to the
original image data. Average values for bright and dark
pixels are then obtained for both the original and
smoothed sets of image data. Looking first at the bright
pixels, one value calculated will be v_BPIXEL, the average
value for all of the bright pixels in original pixel map 10
(v_p ≥ t_s) which are located in the area covered by block
18, as indicated in step 540. Another value, v_BSMOOTH,
the average value for all of the bright pixels in smoothed
version 20 of the image which are located in the area
covered by block 18, will also be obtained as shown in
step 560. Dark values are calculated similarly. That is,
v_DPIXEL, the average value for all of the dark pixels in
original pixel map 10 (v_p < t_s) which are located in the
area covered by block 18, will be obtained as shown in
step 550, and v_DSMOOTH, the average value for all of the
dark pixels in the smoothed version 20 of the image
which are located in the area covered by block 18, will
be obtained as in step 570. Once these average values
are obtained, the distances d and d_s between brighter
and darker averages for pixel map 10 and smoothed
image 20 respectively are calculated as indicated in step
580. That is, d = v_BPIXEL - v_DPIXEL, and d_s = v_BSMOOTH -
v_DSMOOTH. Since d/d_s is typically almost equal to 1 for
contone images, the ratio d/d_s may be used to detect
halftones.
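
The two distances d and d_s can be computed along the following lines. This is a sketch under several assumptions that the text does not fix: the low pass filter is a 3x3 mean with edge replication, the smoothed block is split with the same threshold t_s as the original, and a distance of 0 is returned when one of the two regions is empty. All function names are hypothetical.

```python
def box_filter(block):
    """3x3 mean filter with edge replication (an assumed low pass filter)."""
    h, w = len(block), len(block[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += block[rr][cc]
            row.append(total / 9)
        out.append(row)
    return out


def bright_dark_distance(block, t_s):
    """Average of bright pixels (v >= t_s) minus average of dark pixels (v < t_s)."""
    flat = [v for row in block for v in row]
    bright = [v for v in flat if v >= t_s]
    dark = [v for v in flat if v < t_s]
    if not bright or not dark:
        return 0.0  # assumption: a one-sided block has no bright/dark distance
    return sum(bright) / len(bright) - sum(dark) / len(dark)


def block_distances(block, t_s):
    """Return (d, d_s) for the original block and its smoothed version."""
    return bright_dark_distance(block, t_s), bright_dark_distance(box_filter(block), t_s)
```

On a fine checkerboard pattern, which behaves like a halftone screen, smoothing pulls the bright and dark averages together, so d_s comes out much smaller than d.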

[0019] Figure 7 contains a detailed illustration of step
320 of Figure 4, the preferred embodiment of a process
for initially classifying blocks 18. As shown, a relative
comparison between d and d_s is obtained as indicated
in step 610 in order to determine whether the block
contains contone (d ≈ d_s) or halftone data. Block 18 will
initially be classified as one of four types: UNIFORM,
SMOOTH, WEAK_EDGE or EDGE according to the
magnitude of the distance d or d_s. Distance d is used to
classify contone blocks, while distance d_s is used for
halftones. For contone data d, the value from pixel map
10, is compared to value x_0 as shown in step 620.
[0020] If d is very low (i.e. d < x_0), all pixel values in
the block are substantially the same and the block is
classified as UNIFORM at step 640. If there are
somewhat small differences in pixel values in the block such
that x_0 < d < x_1, as shown in step 622, the block is classified
as SMOOTH at step 650. If there are fairly large
differences in pixel values in the block and x_1 < d < x_2 at step
624, the block will be classified as WEAK_EDGE. If the
differences in the block are very large and d > x_2 at step
624, the block will be classified as an EDGE at step 670.
[0021] If d/d_s is not approximately equal to 1, d_s is
compared to threshold y_0 at step 630. It should be noted
here that two different sets of thresholds are applied for
halftones and contones. Thus, on most occasions,
x_0 ≠ y_0, x_1 ≠ y_1, and x_2 ≠ y_2. The process used to classify
halftone blocks is similar to that used for contone data.
Thus, if d_s < y_0 at step 630 the block is classified as
UNIFORM at step 640. If y_0 < d_s < y_1 in step 632, the block is
classified as SMOOTH at step 650. If y_1 < d_s < y_2 as
indicated in step 634, the block is classified as a WEAK_EDGE
at step 660. If d_s > y_2 at step 634, the block will be
classified as an EDGE at step 670.
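
The initial classification of Figure 7 can be summarized in a short sketch. Several details below are assumptions made only for illustration: the name `classify_block`, the `tolerance` that decides when d/d_s counts as "approximately equal to 1", the treatment of d_s = 0, and the choice to assign a value that falls exactly on a threshold to the higher class (the text uses strict inequalities on both sides). The threshold values in the usage note are invented.

```python
def classify_block(d, d_s, contone_thresholds, halftone_thresholds, tolerance=0.1):
    """Return UNIFORM, SMOOTH, WEAK_EDGE or EDGE for one block.

    contone_thresholds is (x_0, x_1, x_2) and is applied to d.
    halftone_thresholds is (y_0, y_1, y_2) and is applied to d_s.
    """
    if d_s == 0:
        is_contone = d == 0  # assumption: a perfectly flat block is contone
    else:
        is_contone = abs(d / d_s - 1.0) <= tolerance
    if is_contone:
        value, (t0, t1, t2) = d, contone_thresholds
    else:
        value, (t0, t1, t2) = d_s, halftone_thresholds
    if value < t0:
        return "UNIFORM"
    if value < t1:
        return "SMOOTH"
    if value < t2:
        return "WEAK_EDGE"
    return "EDGE"
```

With the made-up thresholds x = (10, 40, 80) and y = (8, 30, 60), a contone block with d = d_s = 50 is a WEAK_EDGE, while a halftone block with d = 255 and d_s = 28 is only SMOOTH because the filtered distance is used.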

[0022] Referring now to Figures 8 and 9, the details
for updating the context of the block will now be
provided. The context of a block is useful when the average
between the dark and bright areas of the block is
relatively high. When this is the case, the block can be
classified as an EDGE as long as its context is TEXT. The
context is initially set equal to PICTURE. It is changed
to TEXT if one of two rules is satisfied: (1) the block
being processed is in a long run of UNIFORM blocks and
the average of the dark pixel values in the block is
greater than a preset brightness threshold; or (2) the block
has been classified as either UNIFORM, WEAK_EDGE,
or EDGE, one of the top, left or right neighboring blocks
has a context which has been set equal to TEXT, and
the difference between that neighboring block and the
current block is smaller than a preset propagation
threshold.

[0023] Turning first to Figure 8, determining whether
block context should be changed according to the first
rule requires finding a run of blocks that have been
classified as UNIFORM as indicated in step 704. Finding a
run of UNIFORM blocks typically involves comparing
the number of consecutive UNIFORM blocks to a run
length threshold t_LU as indicated in step 706. The run
length threshold sets the number of consecutive blocks
that must be classified as UNIFORM for a run to be
established. As also indicated in step 706, v_DPIXEL, the
average value of the dark pixels for consecutive blocks, is
compared to the brightness threshold t_s. A large number
of consecutive UNIFORM blocks with high brightness
levels usually indicates that the blocks contain large
background page areas (i.e. large white areas), thereby
indicating that text is present. Thus, if the number of
consecutive UNIFORM blocks exceeds t_LU and v_DPIXEL >
t_s, the context for the block is changed to TEXT as
indicated in step 708.

[0024] If either the number of identified consecutive
blocks is too small to establish a run or the blocks are
dark (v_DPIXEL ≤ t_s), the context will remain set equal to
PICTURE. Whether additional runs are present in the
block will be determined as indicated in step 710, and if
so the process will be repeated as indicated in the
illustration.
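
The first context rule (Figure 8) can be illustrated for a single row of blocks. The sketch below makes assumptions the text leaves open: the name `text_context_by_runs`, the one-dimensional row layout, and the reading that every block in the run must have a dark-pixel average above t_s (the text could also be read as comparing one average taken over the whole run).

```python
def text_context_by_runs(classes, dark_means, run_threshold, t_s):
    """Return a context ("TEXT" or "PICTURE") for each block in a row.

    classes holds the initial class of each block, dark_means the v_DPIXEL
    value of each block, run_threshold plays the role of t_LU.
    """
    contexts = ["PICTURE"] * len(classes)  # context starts out as PICTURE
    start = 0
    while start < len(classes):
        if classes[start] != "UNIFORM":
            start += 1
            continue
        end = start
        while end < len(classes) and classes[end] == "UNIFORM":
            end += 1
        run = range(start, end)
        long_enough = len(run) > run_threshold
        bright = all(dark_means[i] > t_s for i in run)  # assumed per-block test
        if long_enough and bright:
            for i in run:
                contexts[i] = "TEXT"
        start = end  # continue with the next run, as in step 710
    return contexts
```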

[0025] Turning now to Figure 9, changing the context
of a block to TEXT under the second rule first requires
providing a propagation threshold t_p. The propagation
threshold defines the level of brightness that will indicate
that the block covers blank page areas. Under the
second rule, the context will be changed from PICTURE to
TEXT at step 808 if the block is not SMOOTH (i.e. is
UNIFORM, an EDGE or a WEAK_EDGE) as shown in step
802, either its top, left or right neighbor has a TEXT context
as indicated in step 804, and v_BDIF, the average
difference between bright pixels in the block and bright pixels
in the neighbor text context block, is less than t_p as
shown in step 806. Neighbor blocks are checked
because presumably blocks that contain text will be
located next to other blocks that contain text. However, the
brightness value of the block is compared to that of its
neighbor to assure that this is the case. In other words,
even if the block has a neighboring block with a text
context, a large difference between the average brightness
of the block and its neighbor means that the block
does not contain the large blank page areas that indicate
the presence of text.
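
The second context rule (Figure 9) might look as follows in code. The sketch assumes that the caller has already computed v_BDIF against each neighbor and passes it in as `bright_diffs`; the function name, the dictionary keys "top", "left" and "right", and the use of the absolute difference are illustrative assumptions.

```python
def propagate_text_context(current_context, classification,
                           neighbor_contexts, bright_diffs, t_p):
    """Apply the neighbor propagation rule to one block.

    neighbor_contexts maps "top"/"left"/"right" to "TEXT" or "PICTURE".
    bright_diffs maps the same keys to the v_BDIF value for that neighbor.
    Returns the (possibly updated) context of the block.
    """
    if classification == "SMOOTH":
        return current_context  # step 802: only non-SMOOTH blocks qualify
    for side in ("top", "left", "right"):
        has_text_neighbor = neighbor_contexts.get(side) == "TEXT"  # step 804
        if has_text_neighbor and abs(bright_diffs[side]) < t_p:   # step 806
            return "TEXT"                                          # step 808
    return current_context
```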

[0026] Again, the present invention is directed to
segmenting the data by first identifying blocks that contain
the edges of the image and then separating the blocks
such that those which contain the smooth data and
bright sides of the edges are placed on the lower plane
and the dark sides of the edges are placed on the upper
plane. Once each of the respective planes is generated,
ordinary MRC processing continues. That is, each plane
is compressed using an appropriate compression
technique. In the currently preferred embodiment, upper
plane 12 and lower plane 14 are compressed using
JPEG while the selector plane 16 is compressed using
a symbol based pattern matching technique such as
CCITT Group IV or a method of classifying scanned
symbols into equivalence classes such as that
described in US-A 5,778,095 to Davies issued July 7,
1998, the contents of which are hereby incorporated by
reference. The planes are then joined together and
transmitted to an output device, such as a facsimile
machine or storage device.
1. A method of segmenting digital image data for
mixed raster content processing, comprising: 

a) acquiring a block of the digital image data, 
wherein the digital image data is composed of 
light intensity signals in discrete locations; 

b) designating a classification for said block 
and providing an indication about a context of 
said block; 

c) segmenting said light intensity signals in said 
block into an upper subset and a lower subset 
based upon said designated classification; 

d) generating a selector set which tracks said 
light intensity segmentation; and 

e) separately compressing the digital image da- 
ta contained in said upper and lower subsets. 

2. A method of segmenting digital image data as
claimed in claim 1, wherein said classification
indicates that said block contains substantially smooth
data and/or substantially edge data.

3. A method of segmenting digital image data as 
claimed in claim 1 or claim 2, wherein said classifi- 
cation data designating step further comprises: 

a) measuring an amount of light intensity signal
variation throughout said block;

b) assigning a classification to said block based 
upon said measured light intensity signal vari- 
ation; and 

c) updating said context indication for said 
block, and designating classification for said 
block based upon said updated context. 

4. A method of segmenting digital image data as
claimed in any of the preceding claims, further
comprising:

a) dividing a low pass filtered version of said
block into a bright region and a dark region;

b) calculating average filtered light intensity
values for said bright region and for said dark
region; and

c) obtaining a difference in average filtered light
intensity values between said bright region and
said dark region.

5. A method of segmenting a block of digital image
data into an upper and lower subset, wherein the block
of data is composed of light intensity signals in
discrete locations, comprising:

a) determining whether the block is located on
an edge in the digital image;

b) if the block is on an edge, comparing a
magnitude of each light intensity signal in the block
to a brightness threshold and placing said
signal in the upper subset if said light intensity
magnitude exceeds said brightness threshold
or in the lower subset if said light intensity
magnitude is less than said brightness threshold;
and

c) if the block is not located on an edge, placing 
the block in the upper subset if the block is in a 
group of blocks that have light intensity values 
which are indicative of smooth and dark image 
data, and otherwise placing the block in the low- 
er subset. 

6. A method of classifying a block of digital image data
into one of a plurality of image data types, wherein 
the block of data is composed of light intensity sig- 
nals in discrete locations, comprising: 

a) dividing the block into a bright region and a 
dark region; 

b) dividing a low pass filtered version of said 
block into a bright region and a dark region; 

c) calculating average light intensity values for 
each of said bright region, said dark region, 
said filtered bright region and said filtered dark 
region; and 

d) comparing a difference between said bright 
region and said dark region average light inten- 
sity values to a filtered difference between said 
bright region and said dark region average fil- 
tered light intensity values; 

e) if said average light intensity difference and
said average filtered light intensity difference
are approximately equal, finding a range of
values in which said difference value falls, and
classifying said block based upon said value
range; and

f) if said average light intensity difference and
said average filtered light intensity difference
are not approximately equal, finding a range of
values in which said filtered difference value
falls and classifying said block based upon said
filtered value range.

7. A method according to any of claims 1 to 4, wherein
blocks are classified by a method according to claim 
5 or claim 6. 
ABSTRACT

Compound (or mixed) document images contain graphic
or textual content along with pictures. They are a very
common form of document, found in magazines, brochures,
web-sites etc. Because of the very distinct nature of those two
image classes (text/graphics vs. pictures), their
compression invariably involves multiple compression systems and
a region segmentation (classification) method. We review
state-of-the-art technologies on the subject while focusing
our attention on the mixed raster content (MRC) multi-layer
approach. We also present new results on segmentation for
MRC based on optimized rate-distortion-based block
thresholding.



1. INTRODUCTION 

Documents are now present in a wide spectrum of printing
systems. From offset printers to home desktop computers,
documents in digital form are commonplace. Frequently,
documents are available as bitmaps and may contain text,
graphics and pictures. Compound documents are images
which contain a mix of textual, graphical, or pictorial
contents. Those images are invariably large but a single
compression algorithm that simultaneously meets the requirements
for both text and image compression has been elusive. Many
standard compression algorithms are available today and in
common use commercially. More are continually being
developed to improve on existing methods or to meet special
requirements. As a rule, compression algorithms are developed
with a particular image type, characteristic, and application
in mind. For a different image type or application, a given
algorithm either does not apply or does not perform as well as
some other, better-tailored algorithm. No single algorithm is
best across all image types or applications. When
compressing text, it is important to preserve the edges and shapes
of characters accurately to facilitate reading. Once the text
is binarized, its compression is typically lossless since coding
errors in text are easily perceived. The human visual
system, however, works differently for typical continuous-tone
images because of the richness of patterns and frequency
contents. High frequency errors are better masked and lossy
compression is usually employed, since lossless compression
is often ineffective in this case. In terms of image resolution,
text requires much higher resolution than pictures. Actually,
roughly speaking, text requires few bits per pixel but many
pixels per inch, while pictures require many bits per pixel
but fewer pixels per inch.
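
A quick back-of-the-envelope calculation makes the trade-off concrete. The page size, resolutions and bit depths below are illustrative assumptions, not figures from this paper, and the helper name `raw_size_bytes` is hypothetical.

```python
def raw_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of a page scanned at the given resolution."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: a letter-size page (8.5 x 11 inches).
text_layer = raw_size_bytes(8.5, 11, ppi=600, bits_per_pixel=1)      # binary text
picture_layer = raw_size_bytes(8.5, 11, ppi=200, bits_per_pixel=24)  # color picture
single_layer = raw_size_bytes(8.5, 11, ppi=600, bits_per_pixel=24)   # everything at once
```

Under these assumptions the binary text layer takes about 4.2 MB and the color picture layer about 11.2 MB before compression, while a single layer that satisfies both requirements at once takes about 101 MB.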

Document compression is frequently linked to facsimile sys- 
tems, in which large document bitmaps are compressed be- 
fore transmission over telephone lines. The facsimile systems 



that most people are familiar with today are black-and-white 
(binary images) and conform to international standards set 
by the ITU-T (Telecommunication Standardization sector of 
the International Telecommunication Union, formerly known 
as the CCITT). These standards specify the protocols and 
bi-level coding procedures that sending and receiving sta- 
tions use. Together with the ubiquity of the public switched 
telephone network (PSTN), these standards have led to the 
explosive growth in Group 3 black-and-white facsimile that 
has occurred since 1980. The same convenience and ease 
of use for color facsimile requires wider use of color scan- 
ners, displays and printers; faster modems and communi- 
cation channels to handle the increased data volume; and 
equivalent standards for color facsimile. These enablers are 
already being put in place. For example, the ITU-T last year 
approved V.34 for facsimile, which supports data rates up to 
33.6 Kbps, and it is now available commercially in fax ma- 
chines. There is now a focus on new standards to provide 
color facsimile services over the PSTN and the Internet [1].

When it comes to compound documents, in order to cope 
with the differences between text and continuous tone images, 
different compression algorithms may be applied to each of 
the regions of the document. For that goal, some segmen- 
tation strategy has to invariably be used to discern which 
regions are to be encoded under which strategy. Another 
important parameter of a document compression system for 
compound documents is its imaging model. One can separate 
the image into different regions of interest and compress each 
region accordingly. In this case, the imaging model follows 
space segmentation where each decompressed region can be 
imaged into the document concurrently. Also, one can gener- 
ate multiple image layers, compress each one separately and 
then image all the planes into one. The multilayer model will 
be the focus of this paper. 

2. OVERVIEW 

Image compression has been very intensively studied and we
cannot possibly reference adequately all the most notable
algorithms. However, in terms of international standards the
notable algorithms for binary image compression are MH [2],
MMR [3], JBIG [4] and the forthcoming JBIG-2 [5]. Multi-level
compression algorithm standards are JPEG [6] and the
forthcoming JPEG 2000 [7]. We assume that JPEG is the
standard image compression tool while the current JPEG 2000
verification model (VM) [8] is the state of the art in image
compression, when it comes to pictorial contents. For binary
documents, MMR is adequate for text. JBIG can use arithmetic
coding for improved performance and its multiresolution
approach allows for compression of halftones. The new
drive, however, in the compression of bi-level images is
token-based compression. Contiguous objects are parsed and made
Figure 1. Illustration of MRC imaging model.

into entries in a dictionary. New objects are compared to the
dictionary and if a match is found the code for that object is
repeated. By making loose matches one allows the introduction
of losses in exchange for higher compression. This is one
of the key compression methods in document representation
formats such as DigiPaper [9] and DjVu [10], [11]. Token-based
compression is the heart of the forthcoming standard
JBIG-2 [5]. Even halftones can be compressed with token-based
techniques by descreening the halftone and encoding
new halftone patterns as objects [12]. For that, a segmentation
needs to be performed to identify regions of graphics,
text, halftones, etc., in a binary image, in order to improve
token-based compression in JBIG-2 [13]. Other algorithms
exist which can handle graphic bitmaps well [14], as do
algorithms that perform well (not optimally) for both
text/graphics and pictures using non-linear filter banks [15].

Once a region is identified it can be encoded with the 
proper algorithm. For region identification, segmentation 
algorithms may be employed. For example, the algorithms
used in DjVu and DigiPaper are already in commercial
applications. Multiresolution segmentation was applied successfully
in [16] for document compression, while [17] does the same
using an approximate object location, in order to simplify
the implementation. Multiscale clustering methods are also
effective for segmentation [18]. We will present yet another
segmentation algorithm based on block thresholding, in which
the thresholds are optimized in a rate-distortion sense.

3. MIXED RASTER CONTENT 

The mixed raster content (MRC) imaging model [1], [19], [20]
allows for a multi-layer multi-resolution representation of a 
compound document. The basic 3-layer MRC model repre- 
sents a color image as two color-image layers (Foreground or 
FG and Background or BG) and a binary image layer (Mask). 
The Mask layer describes how to reconstruct the final image
from the FG/BG layers, i.e. to use the corresponding pixel
from the FG or BG layer when the mask pixel is 1 or 0,
respectively, in that position. An illustration of the imaging
model is shown in Fig. 1. The foreground plane is essentially
poured through the mask plane onto the background
plane. The basic 3-layer model is MRC's most common form.
The imaging model, however, is composed of basic elementary
plane pairs: FG+Mask. The FG layer is imaged onto a BG 
layer through the mask plane composing a new background 
image. Another foreground layer can be imaged onto this 
new background through another mask plane and the process 
can be repeated several times. The extended MRC model
then allows for several planes while relying on
foreground-mask pairs. A page may be represented as one, two, three or
more layers, depending on its content. For example, a page 
consisting of a picture could use the background layer only. 
A page containing black-and-white text could use the mask 
layer, with the foreground and background layers defaulted 
to black and to white. 
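
The imaging model described above can be sketched in a few lines. This is only an illustration, assuming single-channel planes of equal size stored as nested lists; the names `compose`, `bg`, `fg` and `mask` are ours and do not come from any MRC specification.

```python
from typing import List, Sequence, Tuple

Plane = List[List[int]]  # one grayscale plane; a color plane could hold tuples instead


def compose(background: Plane, pairs: Sequence[Tuple[Plane, Plane]]) -> Plane:
    """Image each (foreground, mask) pair onto the running background.

    Where the mask pixel is 1 the foreground pixel is used; where it is 0
    the current background pixel is kept. Each pair yields a new background
    for the next pair, as in the extended MRC model.
    """
    out = [row[:] for row in background]
    for fg, mask in pairs:
        for r, mask_row in enumerate(mask):
            for c, m in enumerate(mask_row):
                if m == 1:
                    out[r][c] = fg[r][c]
    return out


# Basic 3-layer case: one background and a single FG+Mask pair.
bg = [[200, 200, 200, 200], [200, 200, 200, 200]]
fg = [[10, 10, 10, 10], [10, 10, 10, 10]]
mask = [[0, 1, 1, 0], [0, 0, 1, 0]]
page = compose(bg, [(fg, mask)])
```

Passing several pairs to `compose` corresponds to repeating the FG+Mask imaging step; passing an empty list corresponds to a page that uses the background layer only.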
Figure 3. Diagram of a segmenter.

Once the original single-resolution image is decomposed
into layers, each layer can be processed and compressed using 
different algorithms. The image processing operations can 
include a resolution change or color mapping. Layers may 
contain different dimensions and have offsets associated with 
them. If a plane contains only a small object, the effective 
plane can be made of a bounding box around the object. The 
reduced image plane is then imaged onto the larger reference 
plane, starting from the given offset (top, left) with given 
size (width, height). This avoids representing large blank 
areas and improves compression. The compression algorithm 
and resolution used for a given layer would be matched to 
the layer's content, allowing for improved compression while 
reducing distortion visibility. The compressed layers are then 
packaged in a format, such as TIFF-FX [21] or as an ITU-T
MRC [19] data stream for delivery to the decoder. At
the decoder, each plane is retrieved, decompressed, processed 
(which might include scaling) and the image is composed 
using the MRC imaging model. 
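
As an illustration of the reduced-plane idea, the sketch below crops a plane to the bounding box of its non-zero pixels and images it back onto a larger reference plane at the stored offset. The helper names are hypothetical, and treating zero as "blank" is an assumption made only for this example.

```python
def bounding_box(plane):
    """Return (top, left, height, width) of the non-zero pixels, or None if blank."""
    rows = [r for r, row in enumerate(plane) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(plane[0])) if any(row[c] for row in plane)]
    top, left = rows[0], cols[0]
    return top, left, rows[-1] - top + 1, cols[-1] - left + 1


def crop(plane, top, left, height, width):
    """Cut the effective (reduced) plane out of the full plane."""
    return [row[left:left + width] for row in plane[top:top + height]]


def image_at_offset(reference, reduced, top, left):
    """Image the reduced plane onto a copy of the reference plane at (top, left)."""
    out = [row[:] for row in reference]
    for r, row in enumerate(reduced):
        for c, value in enumerate(row):
            out[top + r][left + c] = value
    return out


full = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
top, left, height, width = bounding_box(full)
small = crop(full, top, left, height, width)
blank = [[0] * 6 for _ in range(4)]
restored = image_at_offset(blank, small, top, left)
```

Only `small` and its offset and size would need to be stored; the large blank area is implied by the reference plane.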

MRC was originally approved for use in Group 3 color fax 
and is described in ITU-T Recommendation T.44. For the 
storage, archiving and general interchange of MRC-encoded 
image data, the TIFF-FX file format has been proposed [21].
TIFF-FX (TIFF for Fax eXtended) represents the coded data
generated by the suite of ITU recommendations for facsimile, 
including single-compression methods MH, MR, MMR, JBIG 
and JPEG, as well as MRC. As IETF RFC 2301, TIFF- 
FX is a Proposed Internet Standard, currently undergoing 
interoperability testing. MRC has also been proposed as an 
architectural framework for JPEG 2000. 

MRC has been used in products such as DigiPaper and DjVu,
whose owners built special segmenters for them, and also
for check compression [22]. An analysis of the goals of the
segmentation algorithm along with a better description of
MRC can be found in [20]. Typical segmentation strategies
are depicted in Fig. 2, which basically differ in whether one
wants to move text and graphics shapes to the FG or the
Mask plane. Since each layer (FG or BG) may contain unused
pixels (because the pixels in that position will be selected from
the other layer), those can be replaced by any color in order
to enhance compression. This is the function of the
pre-processor. The overall diagram is illustrated in Fig. 3. Given
the pre-processor, the segmenter's function is that of finding a
binary mask for a given input, from which the pre-processor
can derive the output layers based on the input image.

In this paper, we are interested in designing the pre- 
processor and segmenter for optimized compression follow- 
ing a basic 3-layer MRC approach. For simplicity we assume 
layers have same dimensions, and the encoder for FG and 
BG layers is JPEG. For each 8x8 input pixel block the pre- 
processor receives a block of equal dimensions of binary data. 
By inspecting the binary mask, it labels the input block pixels
as useful (U) or "don't care" (X). The X-marked pixels
can be replaced by anything else since they are not going to
be used for decompression. For example:

    Mask block      Labels
    00000111        UUUUUXXX
    00000111        UUUUUXXX
    00001111        UUUUXXXX
    00001111        UUUUXXXX
    00011111        UUUXXXXX
    00111111        UUXXXXXX
    01111111        UXXXXXXX
    11111111        XXXXXXXX

Figure 2. Illustration of typical segmentation strategies for MRC.



The block-wise pre-processor we used in this paper works as
follows. If there are 64 X-marked pixels, the block is unused
and we output a flat block whose pixels have the average
of the previous block (because of JPEG's DC DPCM [6]).
If there are no X-marked pixels the input block is output
untouched. If there is a mix of U- and X-marked pixels we
follow a multi-pass algorithm: in each pass, every pixel marked
"X" which has at least one U-marked horizontal or vertical
neighbour is replaced by the average of those neighbours and
marked "U" for the next pass. The process is continued until
there are no X-marked pels left in the block. The aim of the
algorithm is to replace the unused parts of a block with data
that will produce a smooth block based on the existing data
in the U-marked pels.
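
A minimal sketch of such a block-wise pre-processor follows. It is one possible reading of the description, not the authors' code: the function name, the `keep` parameter (the mask value whose pixels the layer actually uses) and the `prev_avg` argument are assumptions introduced for illustration, and pixels filled in a pass only act as neighbours from the next pass on.

```python
def preprocess_block(block, mask, keep, prev_avg):
    """Fill the "don't care" pixels of one layer block.

    block    : square list of lists with the input pixels
    mask     : same shape, entries 0 or 1
    keep     : mask value whose pixels are useful (U) for this layer;
               all other pixels are "don't care" (X)
    prev_avg : average of the previous block of this layer, used when the
               whole block is unused (keeps the JPEG DC difference small)
    """
    n = len(block)
    used = [[mask[r][c] == keep for c in range(n)] for r in range(n)]
    if not any(any(row) for row in used):
        return [[prev_avg] * n for _ in range(n)]  # flat block
    out = [[float(v) for v in row] for row in block]
    while not all(all(row) for row in used):
        new_used = [row[:] for row in used]
        new_out = [row[:] for row in out]
        for r in range(n):
            for c in range(n):
                if used[r][c]:
                    continue
                # U-marked horizontal and vertical neighbours from the last pass
                neighbours = [
                    out[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n and 0 <= cc < n and used[rr][cc]
                ]
                if neighbours:
                    new_out[r][c] = sum(neighbours) / len(neighbours)
                    new_used[r][c] = True  # counts as U in the next pass
        used, out = new_used, new_out
    return out


# Small 3x3 demonstration (the paper uses 8x8 blocks).
demo_block = [[10, 0, 0], [20, 0, 0], [30, 0, 0]]
demo_mask = [[0, 1, 1], [0, 1, 1], [0, 1, 1]]
filled = preprocess_block(demo_block, demo_mask, keep=0, prev_avg=128)
unused = preprocess_block(demo_block, [[1] * 3 for _ in range(3)], keep=0, prev_avg=128)
```

In the demonstration the useful left column is propagated to the right over two passes, which gives a smooth block with no artificial edge at the mask boundary.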

4. BLOCK THRESHOLDING 

Given the pre-processor just described, our goal is to find the
best mapping (input block to Mask block) which will optimize
compression in a rate-distortion (RD) sense. Rate is
given in bits necessary to encode all 3 layers and distortion is
given in MSE for the reconstructed block (after decompressing
and recombining the layers). For each block, for a fixed
pre-processor, and without scaling, there are $2^{64}$ possible Mask
blocks. Even if we fix the compression schemes, we cannot
possibly investigate all possibilities in search of the segmentation
point which yields the best RD trade-off. Because of that
we devised a simple preliminary experiment: to divide the
mask into 16 sets of 2x2 pixels and assign each pixel in the
2x2 set the same value. The image block is also subsampled
by 2 and interpolated back using nearest neighbour, so that
each 2x2 group in the block has the same intensity level.
Now we have only $2^{16}$ possible arrangements for the mask
block. Sample results are shown in Fig. 4, where we plot all
RD points for each given input block.
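
The reduced search space can be enumerated directly, as the following sketch suggests. The bit ordering used to index the 16 groups and the choice of the top-left pixel of each 2x2 group when subsampling are assumptions made here; the text does not specify either.

```python
def expand_mask(code):
    """Expand a 16-bit integer into an 8x8 mask made of constant 2x2 groups."""
    small = [[(code >> (4 * r + c)) & 1 for c in range(4)] for r in range(4)]
    return [[small[r // 2][c // 2] for c in range(8)] for r in range(8)]


def subsample_and_replicate(block):
    """Subsample an 8x8 block by 2 and interpolate back with nearest neighbour.

    Assumption: the top-left pixel of each 2x2 group is the retained sample.
    """
    return [[block[2 * (r // 2)][2 * (c // 2)] for c in range(8)] for r in range(8)]


# Every value of `code` in range(2 ** 16) gives one candidate Mask block.
first_group_only = expand_mask(1)
ramp = [[8 * r + c for c in range(8)] for r in range(8)]
coarse = subsample_and_replicate(ramp)
```

Looping `code` over `range(2 ** 16)` and encoding each resulting mask would produce the full cloud of RD points for one block.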

A very curious issue arises when we examine a very simple
segmentation strategy: thresholding. In this, for each block
a threshold is selected and the mask is found as:

$$\mathrm{mask}_n(i,j) = u(x_n(i,j) - t_n)$$

where $x_n(i,j)$ represents the pixels of the n-th block, $t_n$ is the
corresponding threshold, and $u(k)$ is the discrete step function.
Since there are 64 pixels in a block, there are at most
64 different meaningful threshold values, whereby setting $t_n$
to be less than the darkest pixel the Mask block can be made
uniform (all samples imaged from one of the layers). We then
mark the RD points with squares in Fig. 4 which correspond
to thresholding the reduced 4x4 block. It is easily seen that
the mask obtained using thresholding yields among the best
RD points. Although we have shown just two examples, all
blocks we tested showed consistent results.

Figure 4. Sample blocks for the simplified experiment
and the corresponding 64K RD plots. RD
points obtained by block thresholding are marked.
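
Block thresholding itself is a one-line operation. The sketch below assumes the step function takes the value 1 at zero, so that a threshold equal to the darkest pixel already gives the uniform mask; that convention, and the function names, are our choices for illustration.

```python
def threshold_mask(block, t):
    """mask(i, j) = u(x(i, j) - t), with u(k) = 1 for k >= 0 and 0 otherwise."""
    return [[1 if x - t >= 0 else 0 for x in row] for row in block]


def candidate_thresholds(block):
    """Distinct pixel values of the block, in increasing order.

    Each value yields a different mask, so a block of 64 pixels has at most
    64 meaningful thresholds. The smallest one yields the all-ones mask.
    """
    return sorted({x for row in block for x in row})


two_tone = [[10, 200], [10, 200]]
split = threshold_mask(two_tone, 200)
uniform = threshold_mask(two_tone, 10)
```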

This result is not decisive but is significant. It tells us
that if the results hold for blocks of 8x8 pixels, then
there is a simple way to find RD-efficient mask blocks. Note
that we said RD-efficient and not optimal, since we cannot
claim otherwise. Nevertheless, we pursue thresholding as a
means of segmentation. The quest is to find a threshold
value $t_n$ for the n-th block. Moreover, we want to find the
optimal value of $t_n$ in an RD sense. In a block there are 64
pixels and therefore only up to 64 threshold values need to
be tested. Given that the pre-processor algorithm is fixed
and so are the compressors (including their parameters such
as entropy coders and quantizers), every threshold value $t_{kn}$
(k-th threshold value for the n-th block) yields a set of Mask,
BG, and FG blocks, which are compressed at a total rate $R_{kn}$
and are recombined resulting in a distortion $D_{kn}$. We define
the cost function for a block as

$$J_n = R_{kn} + \lambda D_{kn}$$

where $\lambda$ is a Lagrange multiplier which is common to all
blocks. It is well known that in the optimal point all blocks
operate at the same slope on the lower convex hull of the RD
points. We test all $t_{kn}$ in a block and select the one that
minimizes $J_n$. Two examples are shown in Fig. 5, which
shows: the input block, the RD points, the RD point for minimum
$J_n$, the RD point for a uniform mask (no segmentation),
the line with slope $-1/\lambda$ which defines the best point and
the resulting Mask block. One example is a two-tone block
wherein segmentation is clearly advantageous and obvious.
The other example is extracted from a picture. Note that a
change in the operating point (slope of the line) may result
in a completely different segmentation.

Figure 5. Sample blocks, RD plots for block thresholding
and resulting Mask blocks. The operating
slope is indicated along with the best RD point (□)
and the RD point for a uniform mask (○).
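
The per-block search can be sketched as below, assuming the Lagrangian cost has the form J = R + λD (consistent with the slope of -1/λ and with smaller λ favouring rate). The `evaluate` callback stands in for the real chain of pre-processing, JPEG coding of FG/BG and mask coding; `toy_evaluate` is a deliberately crude, invented substitute (each layer coded as its mean, 8 bits per non-empty layer, 7 bits per horizontal mask transition) used only to make the example runnable.

```python
def best_threshold(block, lam, evaluate):
    """Return (threshold, cost) minimising J = R + lam * D for one block.

    evaluate(block, mask) must return (rate_in_bits, mse); it is supplied
    by the caller and hides the actual compressors.
    """
    best = None
    for t in sorted({x for row in block for x in row}):
        mask = [[1 if x >= t else 0 for x in row] for row in block]
        rate, dist = evaluate(block, mask)
        cost = rate + lam * dist
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


def toy_evaluate(block, mask):
    """Invented stand-in for the codecs: each layer is reduced to its mean."""
    pixels = [(x, m) for row, mrow in zip(block, mask) for x, m in zip(row, mrow)]
    rate, sq_err = 0, 0.0
    for layer in (0, 1):
        values = [x for x, m in pixels if m == layer]
        if values:
            mean = sum(values) / len(values)
            rate += 8  # assumed cost of one flat layer block
            sq_err += sum((x - mean) ** 2 for x in values)
    transitions = sum(a != b for mrow in mask for a, b in zip(mrow, mrow[1:]))
    return rate + 7 * transitions, sq_err / len(pixels)


two_tone = [[10, 10, 200, 200] for _ in range(4)]
t_high, cost_high = best_threshold(two_tone, 1.0, toy_evaluate)
t_low, cost_low = best_threshold(two_tone, 0.001, toy_evaluate)
```

With the larger multiplier the two-tone block is segmented (threshold 200, zero distortion); with the very small one the uniform mask wins because its rate is lower, which mirrors the remark that a change in the operating point may change the segmentation.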

The main problem in our approach is to accurately compute
the rate for a given block mask. The DC term in JPEG
is encoded as a function of the DC of the previous block. That
forced us to use a slightly greedy approach in which we decide
the operating point for a block, calculate the masks, the
pre-processed layers and the JPEG compressed data based on the
previous layer blocks which were already set. In this sense,
results are not globally optimal. The same reason (interblock
dependency) largely affects the rate of the mask plane. The
rate for the mask plane is by far the largest inaccuracy of the
algorithm. By looking at a single block we cannot compute
how many bits some transition in that mask block would
cost to the overall compression. Binary compression often
works with transitions and run-lengths (or tokens in the case
of JBIG-2). Our simple estimate is better correlated with
the one-dimensional MH algorithm [2], although still imprecise.
We simply apply a fixed penalty in bits (e.g. 7 bits) for
every horizontal transition of the Mask layer. Globally, this
method is a good estimator, but the hope is that it should
provide at least an approximation for the sake of the RD
optimization.

Figure 6. PSNR plots for MRC and JPEG.
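
The mask-rate estimate amounts to counting horizontal transitions. In this sketch the function name is ours, and transitions across the boundary with the neighbouring block are ignored, which is a simplification the text does not confirm.

```python
def estimate_mask_bits(mask, penalty_bits=7):
    """Estimate the Mask block rate as a fixed penalty per horizontal transition."""
    transitions = sum(a != b for row in mask for a, b in zip(row, row[1:]))
    return penalty_bits * transitions


edge_mask = [[0, 0, 1, 1], [0, 1, 0, 1]]
bits = estimate_mask_bits(edge_mask)
flat_bits = estimate_mask_bits([[1, 1, 1, 1], [1, 1, 1, 1]])
```

The estimate grows linearly with the number of transitions, so a busy mask is charged in proportion to its complexity regardless of how well the actual binary coder would handle it.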

PSNR plots are shown in Fig. 6 for the image "compound 1" 
from JPEG 2000's test set. We compare MRC (using the
proposed segmentation) and JPEG. The plots were obtained 
by scaling JPEG's example quantizer table (equal tables in 
both FG and BG planes) in order to vary the overall bit-rate. 
For the mask plane we used a simple fax MMR algorithm. 
The layers are then collected together using tar and gzip.

In Fig. 6, plots are shown for different values of $\lambda$, which
is the operating slope of the segmenter. However, it is much
more efficient to control the overall rate by modifying the
compressors' parameters instead of making the Mask layer
more or less complex. As $\lambda$ decreases, the optimization is
more biased towards minimizing rate in exchange for distortion.
Nevertheless, as $\lambda$ decreases the curves improve in
Fig. 6. Two factors may contribute to this effect. Firstly, the
inaccurate calculation of the rate for the Mask layer makes it
difficult to control the trade-off. The algorithm might choose
to generate very complex masks since the penalty grows linearly
with the number of transitions. As $\lambda$ decreases, we
noted that fewer portions of pictures are actually segmented.
Secondly, the correlation of thresholding optimality and overall
optimality may be weaker for more complex masks. In any
case, results for the MRC scheme are far superior to JPEG's
in terms of PSNR and can be shown to be superior to JPEG
2000's VM coder as well. A comparison of portions of an
image encoded at about 0.4 bits-per-pixel (bpp) is shown in
Fig. 7. It shows an MRC compressed image using: segmentation
through block thresholding for very small $\lambda$; JPEG compression
for both FG and BG layers; and CCITT's MMR for
the Mask layer. It also shows the result using JPEG and the
actual Mask plane used for MRC. Other images and comparisons
can be shown but space limitations preclude the
presentation of more results.

5. REMARKS 

Optimized block thresholding seems to be an effective way to
segment a compound document image for compression. If the
complexity is not acceptable for a given application, one can
use this procedure to guide and train non-RD-based segmentation
strategies. Results so far are not decisive. Further
efforts will be concentrated on better methods to estimate
the rate achieved by compressing the Mask layer and on investigating
the reasons why minimization of rate is much more
important than minimization of distortion in the segmentation
algorithm. Further results and details will be presented
in a forthcoming full paper [23].

Figure 7. Top: portion of a reconstructed image after
compression using MRC at 0.37bpp and 35.4dB
PSNR. Middle: same for JPEG at 0.39bpp and
23.9dB PSNR. Bottom: mask used for segmentation.